Computer and Modernization ›› 2013, Vol. 1 ›› Issue (4): 27-30,3. doi: 10.3969/j.issn.1006-2475.2013.04.007

• Artificial Intelligence •

A Survey of Relevance Algorithms for Focused Crawlers

WANG Shuai, ZHOU Guo-min, WANG Jian

  1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  • Received: 2013-01-07 Online: 2013-04-17 Published: 2013-04-17

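Abstract: This paper first describes the goal of relevance algorithms in focused crawlers and what relevance calculation entails. It then takes an evolutionary view of information processing, with the processing of information feature terms as the common thread. On that basis it systematically analyzes current relevance calculation methods for focused crawlers at three levels: the character layer, the language layer, and the semantic layer. It also compares the advantages and disadvantages of the algorithms across these levels. Finally, it summarizes existing research results and suggests directions for further research.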

Key words: relevance, algorithm, focused crawler, concept
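The following is a rough illustration of a character-layer relevance score. The sketch computes the cosine similarity between a topic keyword-weight vector and a page's term-frequency vector, which is the classic vector space model. It is an assumption chosen for illustration, not the specific algorithm of any method surveyed in this paper. The names `tokenize` and `cosine_relevance`, the ASCII-only tokenizer, and the example weights are all hypothetical. Chinese pages would additionally need word segmentation before the same idea applies.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, purely character-level tokenizer: lowercase the text and
    # split on runs of non-alphanumeric ASCII characters. No stemming,
    # synonyms, or word segmentation (so it does not handle Chinese text).
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def cosine_relevance(topic_weights: dict[str, float], page_text: str) -> float:
    """Cosine similarity between a topic keyword vector and a page's term-frequency vector.

    Returns a value in [0, 1] for non-negative weights; 0.0 if either vector is empty.
    """
    tf = Counter(tokenize(page_text))
    # Counter returns 0 for missing terms, so unmatched topic keywords add nothing.
    dot = sum(w * tf[term] for term, w in topic_weights.items())
    topic_norm = math.sqrt(sum(w * w for w in topic_weights.values()))
    page_norm = math.sqrt(sum(c * c for c in tf.values()))
    if topic_norm == 0 or page_norm == 0:
        return 0.0
    return dot / (topic_norm * page_norm)


if __name__ == "__main__":
    # Example topic description with illustrative weights (an assumption).
    topic = {"crawler": 1.0, "focused": 1.0, "relevance": 2.0}
    print(cosine_relevance(topic, "Focused crawler relevance: relevance scores guide the crawler."))
    print(cosine_relevance(topic, "Weather forecast for Beijing"))
```

A crawler could use such a score to decide which pages to keep or which links to expand first. The score matches surface strings only, so it cannot relate synonyms or conceptually related terms. That is the kind of limitation that motivates moving to the language-layer and semantic-layer approaches.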
